
    Raiders of the Lost Architecture: Kernels for Bayesian Optimization in Conditional Parameter Spaces

    In practical Bayesian optimization, we must often search over structures with differing numbers of parameters. For instance, we may wish to search over neural network architectures with an unknown number of layers. To relate performance data gathered for different architectures, we define a new kernel for conditional parameter spaces that explicitly includes information about which parameters are relevant in a given structure. We show that this kernel improves model quality and Bayesian optimization results over several simpler baseline kernels. Comment: 6 pages, 3 figures. Appeared in the NIPS 2013 workshop on Bayesian optimization.
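
    As a rough illustration of the idea (a hypothetical construction, not the paper's actual kernel), the sketch below compares two points in a conditional parameter space using per-dimension relevance indicators; the discount `rho` and the squared-exponential form are illustrative choices.

```python
import numpy as np

def conditional_kernel(x, y, active_x, active_y, lengthscale=1.0, rho=0.5):
    """Toy product kernel over a conditional parameter space.

    Dimensions where both points have the parameter active are compared
    with a squared-exponential term; dimensions active in only one point
    contribute a constant discount `rho`; dimensions inactive in both
    contribute 1. Positive semi-definiteness of any such construction
    should be verified before using it in a GP.
    """
    k = 1.0
    for xi, yi, ax, ay in zip(x, y, active_x, active_y):
        if ax and ay:
            k *= np.exp(-0.5 * ((xi - yi) / lengthscale) ** 2)
        elif ax or ay:  # parameter relevant in exactly one structure
            k *= rho
        # inactive in both: no information, contributes a factor of 1
    return k

# e.g. comparing a 3-layer and a 2-layer architecture, where the third
# dimension (the 3rd layer's parameter) is relevant in only one of them
print(conditional_kernel([0.1, 0.4, 0.9], [0.2, 0.5, 0.0],
                         [True, True, True], [True, True, False]))
```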

    Efficient Feature Learning Using Perturb-and-MAP

    Perturb-and-MAP [1] is a technique for efficiently drawing approximate samples from discrete probabilistic graphical models. These samples are useful both for characterizing the uncertainty in the model and for learning its parameters. In this work, we show that this same technique is effective at learning features from images using graphical models with complex dependencies between variables. In particular, we apply this technique to learn the parameters of a latent-variable model, the restricted Boltzmann machine, with additional higher-order potentials. We also use it in a bipartite matching model to learn features that are specifically tailored to tracking image patches in video sequences. Our final contribution is the proposal of a novel method for generating perturbations.
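
    The core mechanism is easiest to see in the fully factorized special case, sketched below: adding i.i.d. Gumbel noise to the log-potentials and taking a per-variable argmax (the Gumbel-max trick) yields exact samples. With pairwise or higher-order potentials, the argmax would instead be handed to a MAP solver (e.g. graph cuts), and the resulting samples are approximate; this minimal example is generic, not code from the paper.

```python
import numpy as np

def perturb_and_map_sample(unary, rng):
    """Draw one sample by perturbing log-potentials with Gumbel noise
    and solving the resulting (here trivial) MAP problem.

    `unary` has shape (num_vars, num_states). For independent variables,
    the per-variable argmax of the perturbed potentials is an exact
    categorical sample.
    """
    gumbel = -np.log(-np.log(rng.uniform(size=unary.shape)))
    return np.argmax(unary + gumbel, axis=1)  # per-variable MAP

rng = np.random.default_rng(0)
unary = np.log(np.array([[0.7, 0.3],    # two binary variables with
                         [0.2, 0.8]]))  # known marginals
samples = np.stack([perturb_and_map_sample(unary, rng) for _ in range(5000)])
print(samples.mean(axis=0))  # empirical P(state 1) per variable, ~[0.3, 0.8]
```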

    Learning unbiased features

    A key element in transfer learning is representation learning; if representations can be developed that expose the relevant factors underlying the data, then new tasks and domains can be learned readily based on mappings of these salient factors. We propose that an important aim for these representations is to be unbiased. Different forms of representation learning can be derived from alternative definitions of unwanted bias, e.g., bias to particular tasks, domains, or irrelevant underlying data dimensions. One very useful approach to estimating the amount of bias in a representation comes from maximum mean discrepancy (MMD) [5], a measure of distance between probability distributions. We are not the first to suggest that MMD can be a useful criterion in developing representations that apply across multiple domains or tasks [1]. However, in this paper we describe a number of novel applications of this criterion that we have devised, all based on the idea of developing unbiased representations. These formulations include: a standard domain adaptation framework; a method of learning invariant representations; an approach based on noise-insensitive autoencoders; and a novel form of generative model. Comment: Published in NIPS 2014 Workshop on Transfer and Multitask Learning, see http://nips.cc/Conferences/2014/Program/event.php?ID=428
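
    For readers unfamiliar with MMD, the self-contained sketch below computes a standard biased sample estimate of squared MMD with an RBF kernel; the names and the bandwidth choice are illustrative, not taken from the paper.

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    # pairwise squared-exponential kernel matrix between rows of a and b
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x, y."""
    return (rbf(x, x, sigma).mean()
            + rbf(y, y, sigma).mean()
            - 2 * rbf(x, y, sigma).mean())

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))
y = rng.normal(0.5, 1.0, size=(200, 2))      # shifted distribution
z = rng.normal(0.0, 1.0, size=(200, 2))      # same distribution as x
print(mmd2(x, y))  # noticeably larger than the same-distribution value
print(mmd2(x, z))  # close to zero
```

    Used as a training criterion, such an estimate can be minimized over representations of two domains to penalize distributional differences between them.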

    Learning Hard Alignments with Variational Inference

    There has recently been significant interest in hard attention models for tasks such as object recognition, visual captioning and speech recognition. Hard attention can offer benefits over soft attention such as decreased computational cost, but training hard attention models can be difficult because of the discrete latent variables they introduce. Previous work used REINFORCE and Q-learning to approach these issues, but those methods can provide high-variance gradient estimates and be slow to train. In this paper, we tackle the problem of learning hard attention for a sequential task using variational inference methods, specifically the recently introduced VIMCO and NVIL. Furthermore, we propose a novel baseline that adapts VIMCO to this setting. We demonstrate our method on a phoneme recognition task in clean and noisy environments and show that our method outperforms REINFORCE, with the difference being greater for a more complicated task.
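
    To convey the flavor of the VIMCO estimator mentioned above (a generic reconstruction of the published estimator, not the paper's code): given K log importance weights for one datapoint, each sample's REINFORCE-style learning signal is the multi-sample bound minus a leave-one-out baseline in which that sample's log-weight is replaced by the mean of the others (a geometric mean in weight space).

```python
import numpy as np

def logsumexp(a):
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

def vimco_signals(log_w):
    """Multi-sample bound and per-sample VIMCO learning signals.

    log_w: shape (K,) array of log importance weights
    log p(x, z_k) - log q(z_k | x) for a single datapoint.
    """
    K = log_w.shape[0]
    L = logsumexp(log_w) - np.log(K)  # multi-sample lower bound
    signals = np.empty(K)
    for i in range(K):
        others = np.delete(log_w, i)
        # leave-one-out baseline: substitute the mean of the other
        # log-weights for log_w[i] and recompute the bound
        loo = np.append(others, others.mean())
        signals[i] = L - (logsumexp(loo) - np.log(K))
    return L, signals

# each signals[i] multiplies grad log q(z_i | x) in the score-function
# gradient for the inference network's parameters
rng = np.random.default_rng(0)
L, s = vimco_signals(rng.normal(size=5))
print(L, s)
```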